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;es  properties  associated  with  a  simple  yet  effective  way  to  exploit  parallel 
screte  event  simulations:  averaging  the  results  of  multiple,  independent  replications 
that  are  run;  in  parallel,  on  multiple  processors.  We  focus  on  estimating  expectations  from  termi¬ 
nating  sirnulations,  or  steady  state  parameters  from  regenerative  simulations.  We  assume  that  there 
is  a  GPtr  time  constraint,  l,  on  each  of  P  processors.  Unless  the  replication  lengths  are  bounded, 
one  must  be  willing  to  simulate  beyond  any  fixed,  finite  time  t  on  at  least  some  processors  in  order 
to  always  obtain  a  strongly  consistent  estimator  (as  the  number  of  processors  increases).  We 
therefore  consider  simulation  experiments  in  which  t  is  viewed  as  either  being  a  strict  constraint, 
or  a  guideline,  in  which  case  simulation  beyond  time  t  is  permitted.  The  statistical  properties,  in¬ 
cluding  strong  laws,  central  limit  theorems,  bias  expansions  and  completion  time  distributions,  of 
a  variety  of  estimators  obtainable  from  such  an  experiment  r  -:erived.  We  propose  an  unbiased 
estimator  for  a  simple  mean  value.  This  estimator  requires  pre-selecting  a  fraction  of  the  processors. 
Simulation  beyond  time  t  may  be  required  on  a  pre-selected  processor,  but  only  if  no  replications 
have  yet  been  completed  on  that  processor,  /•" 
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1.  Introduction 


Discrete  event  simulations  often  require  large  amounts  of  computer  time  in  order  to  produce  sta¬ 
tistically  accurate  estimates.  This  is  particularly  true  of  queueing  network  models  of  manufacturing, 
communications  and  computer  systems.  Such  simulations  therefore  represent  an  important  po¬ 
tential  application  for  parallel  processors.  Distributed  simulation,  or  the  execution  of  a  single  re¬ 
alization  of  a  stochastic  process  on  multiple  cooperating  processors,  has  recently  been  the  focus  of 
a  good  deal  of  research.  Fujimoto  [11]  contains  an  excellent  introduction  to  this  topic  including 
a  discussion  of  why  distributed  simulation  is  difficult,  a  description  of  a  variety  of  synchronization 
techniques,  and  a  literature  review.  For  further  surveys  and  a  representative  sample  of  research  in 
this  area  see  [27],  [31],  [29],  [35],  or  [36].  While  significant  speedups  have  been  achieved  in  dis¬ 
tributed  simulations  of  specific,  specially  structured  queueing  systems  (see,  e.g.,  [10],  [17],  [24], 
[28],  or  [38]),  in  our  opinion,  distributed  simulation  has  not  yet  proven  to  be  an  effective  general 
purpose  technique  for  the  type  of  complex  models  that  typically  arise  in  practice. 

However,  there  is  a  simple  alternative  to  distributed  simulation  that  easily  takes  advantage  of  par¬ 
allel  processing  technology:  running  multiple  independent  replications  of  the  model,  in  parallel, 
on  multiple  processors  and  averaging  the  results  of  at  the  end  of  the  runs.  The  method  can  po¬ 
tentially  be  applied  to  any  model  and  does  not  require  advanced  parallel  processing  hardware,  e.g., 
it  can  be  used  on  a  collection  of  workstations  attached  to  a  local  area  network.  Heidelberger  [18] 
developed  a  simple  model  to  compare  the  statistical  efficiency  (specifically  the  mean  squared  errors) 
of  these  two  approaches  for  estimating  so-called  steady-state  quantities.  This  analysis  showed, 
qualitatively,  that  the  parallel  replications  approach  is  statistically  more  efficient  than  distributed 
simulation  provided: 


1.  The  model's  memory  requirements  are  small  enough  so  that  it  can  reasonably  fit  into  the 
memory  of  a  single  processor. 


2. 


3. 


The  model  can  be  run  long  enough  on  a  single  processor  so  that  initialization  bias  is  not  sig¬ 
nificant  (compared  to  the  standard  deviation). 

A  main  reason  why  the  model  must  be  run  for  long  periods  of  time  is  the  slow  rate  at  which 
the  standard  deviation  decreases. 


We  believe  that  these  conditions  are  satisfied  for  many  queueing  models,  e.g.,  networks  in  moderate  ~ 

to  heavy  traffic  with,  say,  up  to  hundreds  of  queues:  such  systems  are  difficult  to  simulate  primarily  _____ 

because  the  standard  deviations  of  the  point  estimates  are  typically  large  (see,  e.g.,  [37]).  As - 

Codes 

technology  advances,  and  processors  become  faster  and  memories  larger,  we  expect  the  class  of  ad/or  ~ 
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models  suitable  for  the  parallel  replications  approach  to  become  ever  larger.  Further  statistical 
properties  associated  with  this  approach  for  steady  state  estimation  are  considered  in  [14]  and 

[15]. 

When  one  considers  the  estimation  of  quantities  arising  from  so-called  transient,  or  terminating, 
simulations,  the  parallel  replications  approach  appears  to  be  even  more  attractive.  Examples  of 
such  quantities  are: 

1 .  The  expected  time  until  a  queue  length  first  exceeds  some  level  (given  a  prespecified  set  of  in¬ 
itial  conditions). 

2.  The  expected  number  of  customers  that  can  be  served  in  a  fixed  time  interval  (again,  given  a 
prespecified  set  of  initial  conditions). 

3.  The  mean  time  to  failure  in  a  reliability  model. 

4.  The  expectation  of  an  integral,  or  sum,  over  a  cycle  in  a  regenerative  process  (see  [33]),  e.g., 
the  integral  of  a  queue  size.  In  this  case  a  replication  is  associated  with  a  regenerative  cycle 
and  the  parallel  replications  approach  for  estimating  transient  quantities  can  be  thought  of  as 
a  parallel  regenerative  method  (see,  e.g.,  [7])  for  estimating  steady-state  quantities. 

Intuitively,  one  should  be  able  to  just  turn  on  the  parallel  processor  for  some  period  of  time,  say 
t,  and  average  the  resulting  observations  at  the  end  of  the  run.  For  a  large  number  of  processors, 
one  should  be  able  to  make  t  small,  thereby  running  only  a  few  replications  on  each  processor. 
Thus  highly  accurate  estimates  should  be  obtained  in  a  very  short  period  of  time. 

However,  there  are  some  potentially  serious  statistical  problems  with  the  parallel  replications  ap¬ 
proach,  especially  for  a  large  number  of  processors.  These  problems  arise  mainly  because  of  the 
sampling  bias  associated  with  the  fixed  completion  time  t.  First  of  all,  what  should  one  do  with 
the  replications  that  are  in  progress  at  time  f?  Should  they  be  discarded,  or  allowed  to  complete? 
Secondly,  how  should  one  average  up  the  resulting  observations?  There  are  several  ways  this  can 
be  done.  Does  it  make  a  difference?  What  are  the  statistical  properties  of  the  resulting  estimators? 
This  paper  studies  these  and  related  questions. 

In  the  case  of  a  single  processor,  these  issues  were  investigated  by  Meketon  and  Heidelberger  [25], 
who  showed  that  under  certain  circumstances  it  is  better  to  complete  the  replication  in  progress  at 
time  t.  Specifically,  if  t  is  measured  in  units  of  simulated  time,  then  in  the  case  of  ratio  estimation 
in  regenerative  simulation,  the  bias  gets  reduced  from  order  1  jt  to  Ijt  by  completing  the  regener¬ 
ative  cycle  (replication)  in  progress  at  time  t.  In  the  parallel  processing  setting,  these  (and  other) 
issues  were  addressed  by  Heidelberger  [19]  who  showed  that  some  of  the  most  obvious  estimates 
obtainable  from  parallel  replication  schemes  are  guaranteed  to  produce  incorrect  results,  in  the  sense 
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that  they  converge  to  the  wrong  quantity  with  probability  one  as  P,  the  number  of  processors,  in¬ 
creases.  In  [19],  other  estimates  with  correct  convergence  properties  were  proposed  and  analyzed. 
Associated  with  these  estimators  is  a  completion  time  penalty  that  arises  because  all,  or  some,  of 
the  incomplete  replications  must  be  allowed  to  finish  in  order  to  reduce  or  remove  the  bias.  A 
subsequent  paper  by  Glynn  and  Heidelberger  [13]  revisited  the  single  processor  case,  obtaining  finer 
bias  expansions  for  a  variety  of  estimators  and  relating  these  expansions  to  the  bias  reducing  tech¬ 
nique  in  [25].  Other  issues  related  to  parallel  replication  schemes  have  also  been  analyzed  by 
Bhavsar  and  Isaac  [1]. 

The  present  paper  explores  the  parallel  processing  implications  of  the  results  in  [13].  We  generalize 
and  improve  upon  the  estimators  suggested  in  [19].  The  generalization  permits  more  than  just  ratio 
estimation  and  the  improvements  include  new  estimators  with  shorter  completion  times.  In  addi¬ 
tion,  whereas  [19]  considers  asymptotic  behavior  as  either  t  -*■  oo  or  P  -» oo,  we  analyze  situations 
in  which  both  i  and  P  approach  oo  simultaneously.  This  allows  us  to  determine,  for  example,  the 
relative  rates  at  which  t  and  P  must  grow  in  order  to  obtain  valid  confidence  intervals  when  one 
discards  all  replications  in  progress  at  time  t.  Since,  in  reality,  we  never  actually  have  an  infinite 
number  of  processors,  these  results  should  be  interpreted  as  determining  how  large  t  needs  to  be, 
qualitatively,  for  a  given,  large,  number  of  processors. 

The  paper  is  organized  as  follows.  In  Section  2,  we  introduce  notation  and,  in  the  interest  of 
keeping  the  paper  self-contained,  review  the  most  relevant  results  from  [13]  and  [19].  Section  3 
considers  the  estimation  of  a  simple  mean  value,  while  Section  4  considers  the  estimation  of  a 
nonlinear  function  of  a  vector  of  simple  means,  e.g.,  a  ratio  of  two  mean  values.  Completion  time 
results  associated  with  the  various  estimators  are  derived  in  Section  5,  and  the  results  are  summa¬ 
rized  in  Section  6. 
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2.  Notation  and  Review 


We  let  P  denote  the  number  of  processors.  We  assume  that  processors  are  identical  and  that,  si¬ 
multaneously,  each  processor  runs  multiple  replications  of  the  simulation.  The  output  of  repli¬ 
cation  j  on  processor  i  is  a  random  variable  (r.v.)  The  goal  of  the  simulation  is  to  estimate 
/ 1  =  EfAjj].  We  let  ?i}  denote  the  (random)  amount  of  (computer  or  reai)  time  required  to  run  rep¬ 
lication  j  on  processor  /  and  let  5)(>i)  =  t;-|  -f  —  4-  rin  be  the  time  that  it  takes  processor  /  to  complete 
n  replications  (also  let  5,(0)  =  0).  In  a  simulation  of  length  t,  processor  i  can  complete  N-t  (/)  repli¬ 
cations  where  N-t (t)  =  sup(  n>0:  St(ri)  <  l }.  We  make  the  reasonable  assumption  that 
{ (A'jj, Tjj), /  =  1,...  ,P,j>  1 }  are  i.i.d.  (independent  and  identically  distributed)  r.v.'s.  Under  this 
assumption  {/V-  (/),  t  >  0),  i—  1 . P  are  i.i.d.  renewal  processes  (see  [5]  or  [34]). 

Notice  that  there  are  many  possible  ways  to  estimate  fj.  from  such  an  experiment.  One  could  esti¬ 
mate  ii  based  on  simulating  a  fixed  total  number  of  replications,  or  based  on  a  completion  time 
constraint  /  as  in  the  above  setting.  While  other  stopping  rules  are  also  possible,  we  will  only 
consider  estimators  based  on  a  completion  time  constraint,  which  represents  a  realistic  and  practical 
method  for  running  such  parallel  replication  schemes.  In  [19]  a  variety  of  such  estimators  for  p 
were  considered  and  analyzed.  The  first,  and  perhaps  the  most  obvious,  thing  to  do  is  simply  av¬ 
erage  all  of  the  observations  that  have  completed  by  time  /.  This  results  in  the.  following  estimate: 


M,(/V)  = 


p  m 

zi>, 

;=•> 

p 

I>«) 

(=i 


(2.1) 


In  [19],  it  was  shown  that  while  lim  £1(/\  i)  —  n  almost  surely  (a.s.),  lim  pAP,  t)  is  typically  not 

/-*  oo  P-*oo 

equal  to  /x,  in  fact  lim  ^(P,  t)  =  fi  +  0(1//)  a.s.  In  other  words,  if  one  attempts  to  estimate  n  by 

P-*oo 

running  a  very  large  number  of  processors  for  a  short  amount  of  time,  then  the  estimate  need  not 
converge  to  fi.  On  the  other  hand,  if  one  completes  all  the  replications  in  progress  at  time  /,  then 
the  estimate 


/Wv) 


p  h'ii 0+1 

z  z  * 

(=1  j=  1 


I 
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has  the  property  that  lim  £2(P,  t)  =  lim  £2(P,  0  =  M  a.s.  The  difference  in  asymptotic  behavior 

f— oo  P— oo 

(as  P  -+  co)  between  $.\(P,  /)  and  /^(P,  t)  is  due  to  the  ratio  form  of  the  estimates  and  the  fact  that 
(N;  (l)  -f  1)  is  a  stopping  time  and  therefore  by  Wald's  equation  (see  page  186  of  [20]),  the  limit  of 
1  IP  times  the  numerator  of  Equation  2.2  converges  to  E[A'-  (/)  -f  l]E[A’jj]  while  1/P  times  the  de¬ 
nominator  converges  to  E[A'(-  (t)  ~  1].  Thus  the  limiting  ratio  is  independent  of  t  and  produces  the 
desired  result.  Since  N-t  (/)  is  not  a  stopping  time,  this  argument  cannot  be  applied  to  £i(P,  t)-  The 
price  to  be  paid  for  this  consistent  estimation  of  fi  is  an  increased  completion  time,  which  was 
shown  to  grow  as  order  ln(P)  in  [19]  under  a  variety  of  distributional  assumptions  on  . 

This  discussion  shows  that  for  the  estimators  described  above,  one  must  simulate  beyond  time  t  in 
order  to  obtain  strong  consistency  as  P  ->  co  We  next  show  that  no  matter  what  estimator  is  used, 
one  cannot  expect  to  always  get  the  right  answer  by  simply  setting  a  fixed,  finite  completion  time 

A 

i  and  “throwing  processors”  at  the  problem.  More  specifically,  suppose  t  is  given.  Let  O^P,  t )  be 
any  estimator  for  that  can  be  constructed  from  information  obtained  in  the  interval  [0,  t], 

A  A 

i.e.,  0x(P,  t )  is  a  function  of  {(Xn,  ~n),  j  <  N-t  (t)  ,i  —  1 , ... ,  P}.  We  require  O^P,  t)  to  be  a  universally 

A 

valid  estimator  in  the  sense  that  lim  O^P,  l)  =  E[A'1J]  a.s.  regardless  of  the  distributions  of  X„  and 
Tjj.  Suppose  now  that  >  /}  >  0  and  define  if  TtJ  <  t  and  -  X]}+  l  if  >  t.  Then 

E[A'1J]  ^  E[  y,j],  however  d^P,  t)  =  0y(P,  t)  since 

{(Xiy  Tjj),  j <  (t) ,  /=  1, ... ,  P)  =  {(Tjj,  Tjj),  i<Ni(t)  ,/=l . P}.  Therefore 

E[Xn2  =  lim  6x(P,  t)  =  lim  Oy(P,  t)  ^  £[/,,],  i.e.,  6y(P,  t)  is  not  consistent  for  (L,:,  tk).  Thus  no 
such  universal  estimator  exists,  and  one  must  be  willing  to  simulate  beyond  time  i  on  at  least  some 
processors  in  order  to  obtain  a  universally  valid  estimator.  This  paper  will  define  and  analyze  the 
properties  of  such  a  class  of  estimators. 

Before  defining  these  estimators,  we  need  to  review  some  results  from  [13]  for  the  case  of  a  single 
processor.  To  prevent  introducing  new  notation,  we  keep  the  processor  subscript  i,  even  though 
it  is  not  needed  in  the  rest  of  this  section.  Define  A)(0)  =  0  and  for  any  n>  I,  define 

_  n  _ 

X;  (n)  =  H  In.  For  the  case  of  a  single  processor,  the  properties  of  Xt  ( A',-  (t))  were  studied  in  detail 
7=1 

in  [13].  The  basis  for  determining  these  properties  is  the  relationship 


E[A'f  [Ntm  =  E[A),  ;  <  0  =  p  -  E[Xn  ;  xn  >  Q  (2.3) 

where,  for  a  real- valued  r.v.  Y,  E[L  ;  A]  denotes  E[L  1(A)]  and  1(A)  denotes  the  indicator  of  the 

event  A.  This  relationship  depends  on  the  fact  that,  given  A,i-  (t)  =  k,  (Xn . Xik)  are  exchangeable 

r.v.s.  Equation  2.3  has  appeared  in  Pathak  [30]  and  Kremers  [21]  in  the  context  of  survey  sam¬ 
pling  from  a  finite  population.  Kremers  also  states  the  result  for  the  so-called  infinite  population 
case  which  corresponds  to  our  probabilistic  setting  (for  a  single  processor).  A  special  case  of 
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I 

Equation  2.3  when  Xt]  =  t,  appears  on  page  93  of  Ross  [32],  From  Equation  2.3,  bias  expansions 
can  be  obtained.  For  example  (Corollary  2.5  of  [13]),  if  E[  I  XtJ  |  ]  <  co  and  E[  | XiJTij>  |  ]  <  oo  for 
some  jD  >  0,  then  E[A)  (Af)  (t))]  =  a*  +  o(Cp).  Similarly  (Corollary  2.6  of  [13]),  if  E[  |  X^e°~'>  |  ]  <  oo 
for  some  6  >  0,  then  E[A'/  (N-t  (f))]  =  fi  -j-  o(e~0r). 

Equation  2.3  also  suggests  an  unbiased  estimate  of  fj.  as  follows.  Defining  N,  ( i )  =  max(l,  Nt  (/)), 

then  E[Xt  (Nt  (.'))]  =  M-  In  order  to  form  this  estimator,  one  must  complete  the  replication  in 

1  progress  at  time  t  only  if  no  replications  have  yet  been  completed  (i.e.,  if  N,  (l)  =  0,  or  equivalently, 

i  ~  _  _  ~ 

zn>  t).  Strong  laws  and  central  limit  theorems  for  both  Xt  (.V,  (t))  and  X,  (Nt  (l))  follow  directly  from 

standard  results  in  probability  and  renewal  theory:  For  example, 

;  ^  ( xtm o)-m  )  -  (2-4) 

as  l  -*  oo  where  =>  denotes  convergence  in  distribution,  \'(a,b)  denotes  a  normally  distributed  r.v. 

!  2 

;  with  mean  a  and  variance  b,  and  c  is  the  variance  of  Xtj  (assumed  finite).  Drawing  on  the  results 

of  Chow,  Hsiung  and  Lai  [3],  uniform  integrability  and  moment  convergence  associated  with  these 
!  central  limit  theorems  are  given  in  [13].  For  example,  Theorem  4.2  of  [13]  states  that  if 

i  E[  |  Xl}  1 2p'rl+<5]  <  oo  and  E[T,j2p+6]  <  oo  for  some  p>  0  and  <5  >  0,  then 

i 

j  Km  E[ !  V?  ( X,  (N,  (0)  -  M )  \P1  =  E[  I  JV(0,  ^E^])  K  ]  .  (2.5) 

In  addition,  multidimensional  versions  of  the  central  limit  theorem  in  Equation  2.4  are  also  valid. 
These  can  be  combined  with  Taylor  series  expansions  and  the  uniform  integrability  of  Equation 
i  2.5  to  obtain  central  limit  theorems  and  bias  expansions  for  nonlinear  function  estimation. 
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3.  Parallel  Estimators  for  a  Simple  Mean 


We  now  build  on  the  results  described  in  the  previous  section  to  derive  and  analyze  three  alternative 
parallel  processing  point  estimators  for  a  simple  mean.  The  first  estimator,  has  the  property 
that,  like  /IfP,  i),  it  can  be  formed  at  exactly  time  t.  We  define 

P 

m)  =-jr  (3.i) 

i=! 

Notice  that  while  Ji(P,t)  and  $.{(P,  t)  both  make  use  of  the  same  underlying  observations,  these 
observations  are  combined  differently.  Our  first  result  concerns  the  expected  value  and  convergence 
properties  of  f{P,i). 


Proposition  3.1 

If  0  <  t(]  <  oo  a.s.  and  if  E[  I  X{j  \  ]  <  oo,  then 

1.  lim/I(P,0  =  At  a.s. 

t-+oo 

2.  If  there  exists  a  finite  constant  B  such  that  <  B  a.s.  and  if  B  <  t,  then 
Et/IfP,/)]  =  p.  and  lim  Ji{P,t)  =  n  a.s. 

P—  oo 

3.  IfE[UijTjj*|]<oo  for  some  k  >  0, 

then  E[)Z(/V)]  =  n  +  o(t~k)  and  lim  Ji{P,t)  =  n  +  o (t~k)  a.s., 

P— oo 

i.e.,  lim  t^lim  \ji{P,t)  —  n\  =0  a.s. 

t—*oo  p-+  OO 

4.  If  F[  |  X^e0*'1 1  ]  <  oo  for  some  6  >  0,  then 

E[/I(P,0]  =  m  +  o(e~0t)  and  lim  JL(P  t)  =  n  +  o(e~°‘)  a.s. 

P-*oo 

5.  If  E[iA’I1l 1+6]  <  oo  for  some  6  >  0,  and  if  lim  tP  =  oo,  then 

1  P-*oo 

Ji(P,  tp)=>  as  P  -*  oo. 

Proof:  Result  (1)  follows  by  ordinary  strong  laws  for  cumulative  processes  since  P  is  fixed.  Results 
(2),  (3),  and  (4)  essentially  follow  from  the  strong  law  of  large  numbers  and  Equation  2.3,  see 
Corollaries  2.1,  2.5  and  2.6  of  [13],  respectively.  For  (5),  by  Chebyshev's  inequality 
P{|/I(/>,  tP)  -  fi\  >  s)  <  E[|A)(fVi(/p))  -  mI]/s  which  converges  to  zero  by  Corollary  3.2  of  [13]. 
□ 

Since  Ji (P,t)  is  biased  for  finite  t,  we  next  define  an  unbiased  estimate  of  /x  in  the  parallel  setting  to 
be 
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(3.2) 


P 

HP .0  =~jr  £W/(0)  • 

/=i 

This  estimator  may  require  simulating  past  time  /  since  it  requires  completing  at  least  one  repli¬ 
cation  on  each  processor.  Specifically,  one  must  complete  the  replication  in  progress  on  processor 
i  if  and  only  if  N,  (t)  =  0.  The  convergence  properties  of  this  estimator  are  stated  next.  The  proof 
of  Proposition  3.2  is  not  given  since  it  is  basically  the  same  as  that  of  Proposition  3. 1 . 

Proposition  3.2 

If  0  <  -Tjj  <  co  a.s.  and  if  E[  I j  ]  <  oo,  then 

1.  E[?1(M3=/i> 

2.  lim  /L(P,t)  =  lim  £i(P,t)  =  pi  a.s. 

r— oo  p-,  oo 

3.  If  E[  I  A1,.  | 1+<5]  <  co  for  some  S  >  0,  and  if  lim  tP  =  oo,  then 

1  P—  oo 

as  P  -*  oo. 

The  third  estimator  we  consider  is  also  unbiased  but  has  a  shorter  completion  time  than  that  as¬ 
sociated  with  Since  E[/7(P,t)]  =  E[XnI(ri{  <  /)],  in  order  to  obtain  an  unbiased  estimate 

of  n  =  E[A”,j],  we  need  only  estimate  the  remainder  term  E[A'/1/(t(-1  >  f)].  Instead  of  using  all  P 
processors  (as  p.\(,P,t)  does),  we  use  P,  pre-selected  processors  to  estimate  the  remainder  term. 
Thus,  rather  than  requiring  that  at  least  one  replication  be  completed  on  all  processors,  we  only 
require  at  least  one  replication  on  the  P,  pre-selected  processors.  We  assume  that  these  pre-selected 
processors  are  labeled  /=  1, ... ,  P,.  Formally,  this  estimator  can  be  written  as 

pi 

?2(P,P„0  =  H(P,0  +  -J-  ^^>0  •  (3-3) 

1  i=  I 

Notice  that  if  P,  =  P,  then  HP>  P\<  0  =  Pi(^>0>  whereas  if  Pj  =  0,  then  (by  convention) 
HP>  P\’1)  —  p(P>0'  Proposition  3.3  describes  the  convergence  properties  of  (P>  P\>  0-  Note  that 
the  P1  processors  must  be  pre-selected:  an  unbiased  estimate  would  not  result,  e.g.,  by  taking  the 
first  Pj  uncompleted  replications  that  actually  do  complete. 

Proposition  3.3 

If  0  <  <  oo  a.s.  and  if  E[  I  A'j  I  ]  <  co,  then 
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1.  i&ziP,  P i>  0]  =  M> 

2.  lim  ^(P,  P\,i)  =  n  a.s. 

t-+oo 

3.  If  P  -» oo  and  Pi  ->  oo,  then 

th.{P>  p\>  0  i1  a-s- 

4.  If  E[  I  A).  1 1"r<5]  <  oo  for  some  <5  >  0,  iim  tp  =  co,  and  lim  Px=co,  then 

P-*  oo  P-+oo 

M2(P.  P] ,  =>  /i  as  P  -*•  oo. 

We  next  turn  to  central  limit  theorems  for  these  estimators.  Define  fit  =  E[A^/(t,i  <  /)], 
<r2(t)  =  Var[A'/(Afi-(0)].  and  S2(/)  =  Var[A)  (/V)  (/))].  We  begin  with  central  limit  theorems  for  Ji{P,i) 
and  £,(P,0  as  either  P  -*  oo,  or  t  ->  co.  These  are  applications  of  well  known  results  in  applied 
probability. 

Proposition  3.4 

If  0  <  <  oo  a.s.  and  if  a  <  oo,  then 

1.  If  o2(t)  <  oo  then, 

VP0i(P,0  -  m,)  “»  <t(/)/V(0,  1)  as  P  ->  co, 

2.  If  <r2(?)  <  oo  then, 

JP  (^i(P,0-M)=>  Sf(OW(0, 1)  as  P  ->  oo, 

3.  If  E[t,j]  <  oo  then, 

■J~Pt  (Ji(P,f)  -ft)**  <tE[tij31/2A^(0,  1)  as  /->  oo, 

4.  If  EOj,]  <  oo  then, 

JPtfa(P,Q-li)=>oEfcJl,2Nm  as  t —*  oo. 

In  order  to  obtain  central  limit  theorems  as  both  l  -*  oo  and  P  -*  oo  simultaneously  (triangular  ar¬ 
ray  central  limit  theorems),  we  first  note  that  if  E[  |  Ajj  1 5+<5J  <  oo  and  E[t,  4+<5]  <  co  for  some 
S  >  0,  then  by  Equation  2.5  (Theorem  4.2  of  [13]) 

lim  t  o 2(f)  =  lim  t  o2(()  =  Ef-r;:]  o'  .  (3.4) 

/-*  oo  f-»oo  1 

Proposition  3.5 

If  0  <  Tjj  <  co  a.s.,  a2  >  0,  E[  |  Xs  | 5+<5]  <  co,  0  <  E[Tjj4+‘5]  <  oo  for  some  S  >  0,  and  lim  tp  =  oo, 
1  1  ‘  P-*  oo 

then 

1.  (ji(P,tp)  -  fitp)  =>  o  E[T,j]i/2  /V(0,1)  as  P-»  oo, 
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2.  JPip  (  MP'h)  -p)~°  ECTij]1/2  iV(0,l) 


as  P  oo. 


Proof:  We  will  show  that  the  conditions  of  Lyapounov's  Theorem  (Theorem  7.3  in  [2])  are  satis¬ 
fied.  For  ?i (P,tp),  we  first  show  that  yfP  (JlfP.tp)  —  fi)/a(tp)  =>  A'(0,1)  from.which  the  result  fol¬ 
lows  by  Equation  3.4.  In  our  case,  Lyapounov's  condition  reduces  to  showing  that 

ECI^Wp))-^2**]  n  ,, 

P^  o-\lp)  (  } 

as  P  — *  co  for  some  small  c  >  0.  However  this  follows  by  multiplying  the  numerator  and  denomi¬ 
nator  of  Equation  3.5  by  2+£  and  applying  Equations  2.5  and  3.4.  The  proof  for  Ji(P,tp)  is 
similar.  However,  Lyapounov's  Theorem  applies  to  sums  of  r.v.'s  with  means  0,  which  accounts 
for  the  centering  term  being  ntp  rather  than  n-  0 

We  next  consider  when  the  centering  term,  nlp,  in  the  central  limit  theorem  for  Ji(P,tp)  can  be  re¬ 
placed  by  fi,  the  desired  quantity.  The  ability  to  replace  p.tp  by  p.  depends  on  the  relative  growth 
rates  of  P  and  tP,  as  well  as  moment  conditions  on  Xt]  and  t,j.  If  the  number  of  processors  P  grows 
too  quickly  with  respect  to  the  time  constraint  tP,  then  the  residual  bias  remains  significant  and  the 
central  limit  theorem  cannot  be  used  to  form  confidence  intervals  for  p.  We  next  give  precise 
conditions  under  which  the  desired  central  limit  theorem  is  obtained. 

Proposition  3.6 

Under  the  same  conditions  as  in  Proposition  3.5, 

JPtp(p(P,tp )  -p)=>o  E[ti;]1/2  /V(0,1)  as  P-*  oo, 


provided  either: 

1.  E[Uij|r,/]<oo  and  P  =  0(tp2/c~]),  or 

2.  E[  I  Ay  |  e0T|1]  <  oo  for  some  9  >  0,  and  P  =  O(e20tpjlp). 

Proof:  For  part  (1),  since  Proposition  3.5  is  valid,  by  Theorem  4.1  of  [2],  it  suffices  to  show  that 
-J Ptp  \ptp  -  p\  -*■  0.  But  by  (3)  of  Proposition  3.1,  tpk \ptp  —  p\  -+  0,  from  which  the  result  fol¬ 
lows.  The  proof  of  part  (2)  is  similar.  □ 

Note  that  the  maximum  allowable  number  of  processors  increases  (with  respect  to  the  time  con¬ 
straint)  with  increasing  moment  assumptions  on  T,r  For  example,  by  the  Cauchy-Schwarz  Ine¬ 
quality,  E[  |  A'y  |  tjj^]  <  E[Ay2] 1  ^EOy2*]  1  ^2  <  oo  for  k  =  2  under  our  base  assumptions  in 
Proposition  3.5.  Thus,  under  these  base  assumptions,  we  require  P  =  0{tp  ),  or  equivalently, 
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tp  =  a2(P'^3)  (a  sequence  ap  =  £l(bp)  if  there  exists  constants  C  and  P0  such  that  aP>  CbP  for  all 

p>p0)- 

We  next  consider  central  limit  theorems  for  £2 (P>  P\>  0-  The  primary  intent  of  this  analysis  is  to 
show  that  the  the  growth  restrictions  (l)  and  (2)  of  Proposition  3.6  can  be  loosened  considerably 
(since  /I2(P,  P\,  t)  is  unbiased)  even  if  P,/P  is  very  small.  Define  /(,{/)  =  XnI(rn  >  t). 

Proposition  3.7 

1.  Under  the  same  conditions  as  in  part  (3)  of  Proposition  3.4, 

JN(p2(P,  pu  0-p)=>  aE[  ^fl2N(0,l)  as  /  -» oo. 

2.  Let  P,andP->oo  in  such  a  way  that  P,/P  =  a  for  some  a  (0<«<1).  Let 

(t)  =  o2(t)  -2  fj.t  E[/?,(f)]  +  Var[P((f)]/a.  Under  the  same  conditions  of  part  (1)  of  Proposi¬ 
tion  3.4 

JPifaiP’  P\ .  0  -  p)  =*  ^(0  m  1)  as  P  -*  00. 

3.  Let  tp ,  P,  and  P  -*  00  in  such  a  way  that  P,/P  =  a  for  some  a  (0  <  a  <  1).  Assume  the  con¬ 
ditions  of  Proposition  3.5.  Then 

JPrp{a2{P,  P„  tP)  -n)=>  a  EO,,]1'2  N(0,l) 

4.  Let  tP ,  P |  and  P  -» 00  in  such  a  way  that  P,/P  =  aP  -»  0.  Assume  the  conditions  of  Proposi¬ 
tion  3.5.  Then 

(&(/>,  P, ,  tp)  -  m)  -  a  EtjJ12  N{ 0, 1) 
provided  either: 

a.  E[T^  -rj-]  <00  and  P/P,  =  0(tPk~{)  for  some  k  >  1,  or 

b.  E[Zj  e°  T"]  <  cc  and  P/P,  =  0(eetpltP)  for  some  6  >  0  . 

Proof:  For  part  (1),  Because  of  Equation  3.3  and  (3)  of  Proposition  3.4,  the  result  will  be  true 

provided  1  Rt(t)  |  =>  0  for  /  =  1, ,  P,.  But  P {-Jt  I  /?,(/)  I  >  ^  P{t„  >  0  -»  0  as  /  -»  00.  For  part 

(2),  define  Y,  =  aX^N,  (t))  +  Rt(t),  Z,  =  ( 1  -  «)«■  (0),  7=1  YiIP] ,  and  Z  =  2  Z,/(P  -  P,). 

_  _  1=1  /=  P,  +1 

Then  £2(P,  P,,  t)  -  Y  +  Z.  Let  /iy  =  E(TJ,  =  E[Z,]  and  notice  that  n  =  nY  +  t*z-  Thus 

■JP  (P2(P./V)  ~  p)  =  VP  (F  -  nY)  +  JP{Z  -  /r2).  (3.6) 

Let  <7  y  =  Var[  K;]  and  <r|  =  Var[Z,]  =  (1  -  a.)2o2(t).  By  the  ordinary  central  limit  theorem, 
yfaP  (7  -  nY)  =>  oyN(0,\)  and  7(1  -  a)P  (Z  —  nz)  oz  N[0,\).  Since  Y  and  Z  are  inde¬ 
pendent,  the  convergence  in  distribution  occurs  jointly  and  the  result  will  follow  by  Equation  3.6 
provided  o\(t)  =  o\ja  +  ct|/(1  —  a).  But 
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(3.7) 


4  =  «2Var[^M  (0)]  +  Var[/?X03  +  2«Cov[A)(/V;  (0),  Rt(t) ]  . 

Since  Rl(t)Xi(Ni  (t))  =  0,  Cov[A'i(A/I  (f)),  /?,(?)]  =  —  Mr  E[/?,-(f)],  and  the  result  then  follows  by  simple 
calculations.  Part  (3)  similarly  follows  since  it  is  easily  shown  that  t^a\{tp)  ->  Efr^Jcr2.  For  part  (4), 
adopting  the  above  notation,  since  aP->0,  by  part  (1)  of  Proposition  3.5, 
■fPtp  {Z  -  Mz)  ^  °  E[Tij]^2  /V(0,1).  The  result  will  then  be  true  provided  JPlP  |  F  -  fiy\  =>  0 
which,  in  turn,  is  true  provided  Va i[JP  tp  (F  -  My)  ]  =  P  tP  4  /  P\  0-  But 

P  t 

—  =  aPtPa2(tP)  -  2tPn[pE[Rl{tP)']  +  ( tP  P I  P, )  Var[P,(/p)]  .  (3.8) 

The  first  term  on  the  right  hand  side  of  Equation  3.8  converges  to  0  by  Equation  3.4.  For  the 
second  term,  ntp  —>  p  and  f/>E[P((tp)]  -»  0  (provided  E[|  <  oo).  For  the  third  term,  under 

assumption  (a),  tPk  Var^/tp)]  ->  0  so  (fpP/P,)  Var[^(lP)]  ->  0  provided  tpP/P,  =  0(tpk),  i.e., 
P/P,  =  0(//"‘).  The  result  similarly  holds  under  assumption  (b).  Therefore, 
Var[Vp77(F-My)]->0  as  required.  □ 

Note  that  the  central  limit  theorem  for  m,(P,/p)  puts  no  restrictions  on  the  relative  values  of  P  and 
tP.  However,  in  order  for  JL{P,tP)  to  have  a  properly  centered  central  limit  theorem,  a  minimum 
growth  rate  for  tP  with  respect  to  P  is  required.  In  contrast,  the  central  limit  theorem  for 
P2(^>  1p)  places  no  direct  restrictions  on  the  relative  values  of  P  and  tp,  but  rather  requires  a 

minimum  growth  rate  for  tP  with  respect  to  the  ratio  P/P,.  Note  also  that,  for  fixed  t,  a2(t)  is  a 
decreasing  function  of  a  =  P,/P.  Thus  there  is  some  variance  inflation  by  not  pre-selcting  all  of  the 
processors  (a=  1).  However,  part  (4)  of  Proposition  3.7  shows  that  this  variance  inflation  disap¬ 
pears  asymptotically  provided  a  =  aP  does  not  approach  zero  too  quickly  with  respect  to  tP. 

In  order  to  obtain  bia>  expansions  for  certain  nonlinear  functions  of  the  various  point  estimators, 
we  need  to  establish  uniform  integrability  and  moment  convergence  of  the  point  estimators 


Proposition  3.8 

Under  the  same  conditions  as  in  Proposition  3.5, 

1.  (%/^p(mi(7Vp)-m))2  is  uniformly  integrable  as  P oo,  and 

E[(VP4 (  M,(P,fp)  -  n  ))2]  =  a2E[Tij]  . 

2.  If,  in  addition,  either  conditions  (1)  or  (2)  of  Proposition  3.6  hold,  then 
[ytP’p  ( F(P>0>)  -  4  ))2  is  uniformly  integrable  as  P  -» oo,  and 

lim  E[(7P4(4(P,tp)-4))2]  =  ^2E[Tij]  . 
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3.  Under  the  same  conditions  of  Proposition  3.7,  parts  (3)  or  (4),  (jPtp  ( /r2(^>  tP)  —  p  ))2 
is  uniformly  integrable  as  P  -»  oo,  and 
jim  E .[(Vfy  (  &(/>,  />,.  '/>)  -  M  ))2]  =  cr2E[Tij]  . 

Proof:  For  part  (1),  by  Proposition  3.5  (JPtp  ( 'pfP,tP )  —  p  =>  <72E[-Tjj]yV(0,l)2.  Furthermore 

E  [(V^(?,(/V,>)-m))2]  =  PtpVar[p,(P,/P)] 

(3.9) 

=  ^Vp)  ^E[Tjj] 


by  Equation  3.4.  Therefore,  the  result  follows  by  Theorem  5.4  of  [2],  For  part  (2), 
P/pE[(fZ(P,tp)-M)2]  =  PtP(var[)I(P,tp)]  +  (ptp-pf) 


—  lp  a2(lp)  +  P  lp(^tP~  M )2 


(3.10) 


The  Erst  term  on  the  right  hand  side  of  Equation  3. 10  converges  to  a  E[t,,]  by  Equation  3.4,  while 
the  second  term  converges  to  zero  by  the  same  argument  as  in  the  proof  of  Proposition  3.6. 
Combining  this  with  the  convergence  in  distribution  of  Proposition  3.6.  yields  the  result.  The 
moment  convergence  for  part  (3)  was  basically  established  in  the  proof  of  Proposition  3.7,  and  the 
result  then  follows  similarly.  □ 

We  next  state  multidimensional  versions  of  Propositions  3.5,  3.6  and  3.7.  These  will  also  be  needed 

in  the  next  section  for  nonlinear  function  estimation.  These  results  are  simply  shown  by  applying 

the  Cramer-Wold  device  (Theorem  7.7  of  [2])  to  Propositions  3.5  -  3.7.  We  require  some  notation. 

Let  Xy  =  (A'jj(l) . Xy (d))  be  a  d-dimensional  vector  valued  output  of  replication  j  on  processor  i 

and  let  p  =  , pj)  where  pa  =  E[A",j(a)]  and  define  Cab  =  CovfA^a),  3(^(6)].  We  now  define 

multidimensional  analogues  p(P,tP),  P](P,tP)  and  p2(P,Putp)  of  Ji{P,tp),  P\(P,tp)  and 

P2(P,  P\>  lp)<  respectively,  as  follows.  Define  Xi(ri)  =  (Xi{n,  1) . Xi(n,d))  where 

—  n 

Xi(n,  a)  s  2-^ij (a)ln-  Component  a  of  p{P,tP)  is  then  defined  to  be 
7=1 

P 

miP,a)  =-j-  £W(0,«).  (3.11) 

(=1 


The  vectors  pfP,tP)  and  p2(P,  P\>  tP)  are  defined  analogously.  Let  N(0,A)  denote  a  multidimen¬ 
sional  normally  distributed  random  vector  with  means  0  and  Variance/Covariance  matrix  A. 


Proposition  3.9 
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Assume  that  the  conditions  of  Proposition  3.5  apply  to  each  component  of  Xy.  Then 
Jpi; ( -?)=>  E[Ty]1/2  N(OjC)  as  P  ->  oo. 

The  same  multidimensional  central  limit  theorem  holds  for  Ji(P,tp)  and  /?2C^>  Px,  tP)  provided  the 
conditions  of  Propositions  3.6  and  3.7  -pply  to  each  component  of  the  respective  random  vectors. 

We  conclude  this  section  with  a  discussion  of  the  formation  of  confidence  intervals  for  /i.  The 
central  limit  theorems  of  Propositions  3.4  -  3.7  can  be  used  as  the  basis  for  such  confidence  inter¬ 
vals,  however,  in  practice,  the  variance  terms  in  these  limit  theorems  are  not  known  and  must  be 
estimated.  As  usual,  this  presents  no  theoretical  obstacles  since  the  variance  terms  can  be  consist¬ 
ently  estimated  (see,  e.g.,  [7]).  However,  there  are  several  ways  the  variance  can  be  estimated  and 
the  appropriate  estimator  depends  on  whether  t-*  oo,  P  -*  co,  or  t  and  P  -*  oo.  We  therefore 
outline  the  appropriate  procedures  for  p.x(P,t).  Analogous  results  also  hold  for  ]l(P,t)  and 
? 2(P,  P i  -  0- 

First  consider  the  case  when  t  remains  fixed,  but  P  -*  oo.  Define 


p 

%  (P.0  =-jz~T  ZGWW)  ~  Pi(P,0)2  ■  (3-12) 

t=l 

Then  lim  ax(P,  t)-o  ( t )  a.s.  Combining  this  with  the  central  limit  theorem  of  part  (2)  of  Pro- 

P-*  oo  ___ 

position  3.4,  we  obtain  JP(fLx{P,t)  —  n)lax(P,  t)  =>  N( 0,1)  as  P  oo  (the  above  assumes  that 
^2  ... 

0  <  a  (t)<  oo).  From  this  central  limit  theorem,  an  approximate  (say)  90%  confidence  interval  for 
H  is  ?,(/>, 0  ±  1-645  ax{P,  t)lJF.  If  P  is  fixed  and  t  — » co,  define 


r(P,  0 


±Y- 

P  L>  7, 


i- 1  Ni  (0  j=  i 


A [(‘) 

2S 


(3.13) 


and 


%(P,t) 


*(P,  0 


p 


y 


(3.14) 


•  a2  2 

Then  lim  a-AP,  t)  =  E[Tjj]er  a.s.  (assuming  these  terms  are  finite)  and  therefore 

t— oo  1 

JN(Zx(P,t)  -p)fi2{P,t)  =>  /V(0,i)  as  /  -»  oo.  Finally,  if  P  -*  oo  and  tP  -*  oo,  then 
o\(P,  tP )  =>  E[Ty]<72  (assuming  E[  |  A'y  | 2+<5]  <  oo  and  E[Ty1+<5]  <  oo  by  Proposition  3.2).  There¬ 
fore,  Jp^(MP>'p)-p)I°2(P,‘p)  =*  Af(0,i). 
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4.  Nonlinear  Function  Estimation 


In  this  section,  we  consider  estimating  a  real  valued  nonlinear  function  g(fi)  by  either  g(]i{P,tp)), 
giffP.tp))  or  g(u2(P,  Pj,  tP)) .  This  problem  arises  in  many  contexts,  e.g.,  in  variance  estimation 
where  XtJ  =  (Xiy  (i.e.,  Xls(2)  =  A'ij(l)  )  and  gift)  =  n2  ~  Mi-  Another  application  is  steady  state 
estimation  in  regenerative  simulations  in  which  one  is  interested  in  estimating  ratios  of  the  form 
gift )  =  fi\lft2.  We  will  only  consider  situations  in  which  tp,  Pl  and  P  all  -*  oo.  First  note  that  since 
p(P,tp)  =*  ft,  ftx(P,tp)  =>  ft  and  jt2(P,  P\,tp)  =•  ft  (under  the  minimal  moment  conditions  given  in 
Section  3),  then  g(]i(P,tp ))  =>  gift),  g(jlx{P,tp))  =>  gift)  and  g(u2iP,  P, ,  tP))  =>  g{/t)  provided  that  g 
is  continuous  at  ft. 

Define  ga  =  dg!dxa  I  x=u ,  and  Gab  =  d2gldxadxb\x_ft.  Let  Ck(ft)  be  the  set  of  functions  having  finite 
continuous  derivatives  of  order  j  for  j  =  0, ...  ,k  in  a  neighborhood  of  ft.  We  next  use  the  standard 
technique  of  combining  central  limit  theorems  with  Taylor  series  expansions  to  obtain  central  limit 
theorems  and  bias  expansions  for  g(Ji(P,tp)),  g(ft\(P,tp))  and  g(fi2{P,  Pj ,  tp))  (see,  e.g.,  Chapters 
27  and  28  of  [6]  or  Chapter  2  of  [23]). 

Proposition  4.1 

Let  g  e  C\i}t)  and  assume  the  conditions  of  Proposition  3.5  apply  to  each  component  of  X,j.  Define 

d  d 

*1  =  ECtij]  YjTjga  Cab8b 

a=l  b=\ 

2 

and  assume  0  <  a1  <  oo.  Let  tp  and  P  -*  oo. 

1.  If  P,/P  =  «  for  some  a  (0  <  a  <  1). 

Jp~^tg(?2(p> Pu  ‘ p ))  -  gw )  ~  mo  ■ 

2.  JPi;(g(Ji(P,tp))  -  g(ftlp))  ~  oxN( 0,1). 

3.  If  conditions  (1)  or  (2)  of  Proposition  3.6  apply  to  each  component  of  X^,  then 

yfPb(g&(p,tp))  -  *oo)  =*  ^mo- 

4.  If  Px/P  =  aP  -*  0  and  conditions  (a)  or  (b)  of  part  (3)  of  Proposition  3.7  apply  to  each  com¬ 
ponent  of  Xjj,  then 

y/PbW?liP.Pl.*p))-gM)  =•  ^(0,  !)• 

Proof:  For  simplicity  of  notation,  we  will  assume  that  d  —  1  (and  use  the  notation  of  Section  3). 
We  will  only  establish  results  (2)  and  (3).  the  other  cases  and  the  multidimensiona’  versions  can 
be  shown,  without  complications,  along  similar  lines.  Using  a  first  order  Taylor  series  expansion, 
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write  g(]I(P,tp))  =  g(ntp )  +  g'(Zp)  ( Ji(P,tp )  -  fi[p)  where  is  on  the  line  segment  between  p.lp 
and  Ji(P,tp).  Since  ntp  =>  n  and  Ji{P,tP)  =>  fi ,  £,P  =>  \i  and  therefore  g'(c,P)  =>  g'(n).  Therefore,  by 
Proposition  3.5, 

JP^(g(P(Pjp))  ~  g(ntp))  =  g’(tr)  (rtPJp)  -  ntp) 

(4.1) 

For  part  (3),  do  the  Taylor  series  about  n  rather  than  ntp  and  apply  the  central  limit  theorem  of 
Proposition  3.6.  □ 

Note  that  part  (1)  of  this  Proposition  also  applies  to  }t\(P,tp)  since  /r2(/J,  P\,  tp)  =  Pi(Pjp)  for 

«  =  !(/>,  =  py 

We  next  turn  to  bias  expansions.  These  can  be  established  under  a  broad  variety  of  moment  as¬ 
sumptions  and  regularity  conditions  on  the  function  g.  For  example,  in  the  case  of  a  single 
processor,  expansions  for  E[g(X(A,I  (/)))]  and  E[g(X(;V;  (l)))]  were  derived  in  [13]  under  the  as¬ 
sumption  that  the  function  g  is  bounded  a.s.  (and  g  e  C2(u),  plus  certain  moment  assumptions). 
In  [16],  the  expansion  for  E[g(X(^  (t)))]  was  shown  to  be  valid  provided  that  g  is  bounded  by  a 
polynomial  of  degree  r  for  any  r >  0,  i.e.,  lg(X((V(- ( l ))) |  <A+B ||X(A,i- (t))  -  n\\r  (and  g  e  C2(/i), 
plus  somewhat  different  moment  conditions).  This  will  be  true,  e.g.,  provided  g  has  bounded  par¬ 
tial  derivatives  of  order  r.  For  the  simple  case  of  a  function  of  a  mean  of  a  deterministic  number 
of  i.i.d.  r  v 's,  Chapter  2  of  [23]  contains  such  bias  expansions  provided  g  has  bounded  partial  de¬ 
rivatives  of  order  r  (r>  3)  and  the  r'th  moments  of  the  underlying  r.v.'s  are  finite.  In  the  interest 
of  simplicity,  we  will  state  the  results  under  conditions  which  make  the  proof  both  simple  and 
transparent.  Define 


B 


E[t„] 

2 


fls:  1  h=) 


■" ab 


(4.2) 


Proposition  4.2 

Let  g  e  C2(ji)  and  assume  that  all  of  g's  partial  derivatives  of  order  2  are  bounded  everywhere. 
Assume  the  conditions  of  Proposition  3.8  apply  to  each  component  of  Xjj.  Suppose  that 
tP  and  =  P\{P)  -» oo  as  P  ->  oo. 

1.  E[g(7I(/VP))]  =  g(jt)  +  BI(P  tP)  +  o(l  / Ptp)  as  P  -4oo, 

provided  either: 
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a.  E[  I  A”jj  |  Ty*]  <  oo  and  P  =  0(tPk  !),  or 

b.  E[  I  Xls  I  A]  <  co  for  some  0  >  0,  and  P  =  O(e0tpl tP). 

2.  E[g(/r,  (P, //>))]  =  g(M)  -r  P/(PtP)-fo(I/P  tP)  . 

3.  E Zg&2(P,PvtPm  =  Z(t<)  +  Bl{PtP)~o(\lPtp)  . 

Proof:  We  will  again  assume  that  d  =  1  and  show  the  result  for  g(]i(P,tp)).  Using  a  second  order 
Taylor  series  expansion,  we  have 

Ptpwm'p))  -  gw)  =  g'{n)Ptp{n{p,tP)  -  n)  +  - ^-Ptpimtp. )  -  m)2  (4.3) 

where  is  on  the  line  segment  between  ft  and  Jl(P,tp).  The  expectation  of  the  first  term  on  the 
right  hand  side  of  Equation  4.3  equals  g'(u)  P  tP  E[Tn  ;  /;1  >  t/>]  which  converges  to  zero  under 
assumptions  (a)  or  (b).  Again  c,P  =>  n,  so  the  second  term  on  the  right  hand  side  of  Equation  4.3 
converges  in  distribution  to  (1/2)  g”(fi)  o  Ef-r^]  A'(0,1)  by  Proposition  3.5,  so  we  will  be  done  if 
g"{c,P)  P  tp  ( Ji(P,tp)  -  fi  )2  is  uniformly  integrable.  But  this  follows  since  there  exists  a  finite 
constant  M  such  that  |^"(x)  |  <  M  for  all  x.  The  proofs  for  jl^Pjp)  and  ^(P,  P\ ,  tP)  are  similar, 
except  that  the  expectations  of  the  first  order  term  in  the  Taylor  series  expansions  are  identically 
equal  to  zero  since  E[#i(/V/>)]  =  E[/x2(/,1  Px,  tP )]  =  /x.  □ 

Proposition  4  2  states  that  (under  suitable  regularity  conditions  on  g  and  growth  restnctions  on  P, 
Pj  and  tp)  the  bias  goes  to  zero  as  a  constant  divided  by  the  total  simulation  effort  P  x  tP. 
Jackknifing  (see  [26])  is  one  method  that  can  be  used  to  mitigate  bias  due  to  nonlinearity  effects. 
In  [16],  the  jackknife  is  explored  in  the  setting  of  a  single  budget-constrained  processor.  We  intend 
to  study  the  budget-constrained  jackknife  estimator  in  the  multiprocessor  setting  in  future  work. 
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5.  Completion  Time  Analysis 


In  this  section  we  analyze  properties  of  the  random  completion  time  associated  with  ^{P,  P\,  0  (or 

£](/>,/)  with  P  =  />!).  Let  F(x)  =  P{  -qj  <  * } ,  F(x)  =  1  -  F(x)  and  Mn  =  maxfa, . rnl).  Let 

Tt{t)  =  max(-q,,  /) :  T^t)  is  the  completion  time  on  processor  /  (1  <  /<  P{)  given  the  budget  con¬ 
straint  t  (actually  budget  guideline  is  a  better  term).  Then 

M(t,  P,)  =  max(T,(0)  =  max(t,  A/d)  (5.1) 

1  <i<P, 

is  the  completion  time  of  the  simulation  experiment.  We  are  also  be  interested  in  the  total  proc¬ 
essing  time  on  the  pre-selected  processors, 

T(t,P0  =  £  7t(0 ,  (5.2) 

/=  l 

and  in  the  number  of  “active”  processors  at  time  l. 

Pi 

A(t,  P j)  =  X!  /(T‘1  > '> '  (5-3) 

(=1 

i.e.,  A(t,  P,)  is  the  number  of  processors  that  are  required  to  simulate  beyond  time  t. 

Since  T(t,  P{)  and  A(t ,  Pj)  are  just  sums  of  i.i.d.  r.v.'s,  their  limiting  behavior  (as  P j  ->  co)  can  be 
described  by  standard  strong  laws  and  central  limit  theorems.  Similarly,  since  M(t,  P,)  is  basically 
a  maximum  of  i.i.d.  r.v.'s,  its  limiting  behavior  can  be  derived  from  results  in  extreme  value  theory 
(see,  e.g.,  [22]).  We  will  give  a  sampling  of  such  results. 

Consider  first  the  number  of  active  processors  A(l,  P\).  Note  that  A(t,  P|)  is  binomially  distributed 
with  parameters  F(t)  and  P\.  Therefore,  if  t  remains  fixed  and  P,  co,  then  A(t,  P[)/P,  obeys  a 
strong  law  and  is  approximately  normally  distributed  with  mean  F(t)  and  variance  F(t)F(t)/P\. 
However,  if  Pj  and  tp  -*  co  in  such  a  way  that  P\F(tP)  -*■  a  (0  <  a  <  oo),  then  A(tP,  Pj)  converges 
in  distribution  to  a  Poisson  r.v.  with  parameter  a  (see  Section  V.5  of  [9]).  For  example,  if  r,j  is 
exponentially  distributed  with  rate  2  (F(t)  =  e~At)  and  tP  =  (l/2)ln(P,/a),  then  the  Poisson  con¬ 
vergence  is  obtained. 

Turning  now  to  the  total  processing  time  on  pre-selected  processors,  by  Equation  5.2,  if  t  remains 
fixed  and  Pj  -» co,  then  T(t,  P\)IP\  obeys  both  a  strong  law  and  central  limit  theorem.  Now  con¬ 
sider  the  behavior  as  tP~*  co.  Notice  that  T(tp,  P{]jP{tP  is  the  ratio  of  the  actual  computing  time 
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to  the  planned  computing  time  (on  the  pre-selected  processors)  and  (T(tP,  Px)  -  PxtP)IPx  is  the 
average  excess  (unplanned)  computing  time  per  processor. 


Proposition  5.1 

IfE[Tlj]<  oo  ,  then  as  tP  -*  oo, 

1.  |  T{tP,  P,)  -  P,  tP  l/P,  =>  0, 

2.  T(tP,Px)l(tPPx)  =>  1. 

0 

Proof:  For  part  (1),  since  \T(tp,Px)  -  Pxtp\  <  2  I  f(!p)  ~  *p  I, 

i-\ 

mntp.Pi)  -  Pi  tP  i/p,  >€}  <  e[ i T(tP, px)  -  px  tP  i]/(p,  t)  <  ec i Ti(tP)  -  lp  i ]/£ 


j  (x  —  tP)  dF(x)  /e  <  f  x  dF(x)  /e  . 

J0>  /P)  j{*>  tP) 


(5.4) 


But  the  right  hand  side  of  Equation  5.4  converges  to  0  since  E[t,.]  <  oo.  The  proof  of  part  (2)  is 
similar.  □ 

Notice  that  the  convergence  in  Proposition  5. 1  is  still  obtained  even  if  P,  -*  oo  along  with  lP.  Thus 
for  large  tP,  no  matter  how  many  processors  there  are,  the  ratio  of  actual  computing  to  planned 
computing  converges  (in  probability)  to  one. 

We  now  turn  to  the  completion  time.  We  assume  that  F  is  in  the  (maximum)  domain  of  attraction 
of  a  (finite)  extreme  value  distribution,  i.e.,  there  exist  constants  an  and  bn  and  a  finite  r.v.  X  such 
that  an(Mn  -/>„)=•  X  .  For  example,  for  the  exponential  distribution  with  parameter  2,  an  =  2, 
bn  =  ln(n)/2  and  PfA’*  <  x}  =  exp(  -  e~x ).  In  addition,  in  this  case  there  is  a  well  known  closed 
form  expression  for  E [A/„]  :  E[A/„]  =  (1/2)  //„  where  Hn  =  2(1/0  «  ln(/?). 

i=i 


Proposition  5.2 

If  an(Mn  -  bn)  =>  X' ,  then 

1.  If  lim  ap  (tp- bP)  =  -  co,  then 

P]  -*oo  1  1 

apx  (M(tP,  P,)  —  bp t)  =>  X  . 

2.  If  lim  aP  (tp  —  bP  )  =  <x  (  —  oo  <  a  <  oo),  then 

P1  -»oo  1  1 

%  (M(fp.  Pi)  “  fy)  ^  max(«,  x') . 

3.  If  lim  aP  (tP  -  bP  )  =  +  oo,  then 

P|  -»oo  1  1 

P{M(tP,  P,)  =  tp)-*\. 
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Proof:  For  part  (1),  we  will  show  aPi  ( M{tP ,  P ,)  —  MPi)  =>  0. 

P{  I  ap](M(tp<  P\)  —  Mp)  |  >  e)  <  P{MPi  <  tP]  =  ?{aPi(MPi~  bPi)<aPi(tPi~  bPi)} -*Q.  (5.5) 

Part  (2)  follows  from  the  continuity  of  the  maximum  operator.  For  part  (3) 

?{M(tP,  P\)  =  tp}  =  P {MPi  <  tP }  =  P {aPi(MPi  -  bpx)  <  aPyPx  —  bpfi  ->  l.D  (5.6) 

Notice  that  part  (1)  of  Proposition  5.2  will  typically  apply  if  tP  remains  fixed,  in  which  case 
M(tP,  P,)  inherits  the  limiting  distribution  of  MP. 

We  next  consider  the  combined  implications  of  Propositions  3.7  and  5.2  in  a  particular  case. 
Suppose  T,j  has  an  exponential  distribution  with  mean  1.  Let  lP  =  /?  ln(P,).  For  /?  <  1,  part  (1)  of 
Proposition  5.2  applies,  so  E \M{tP,  P,)]«  In (P,).  For  jS  >  1,  part  (3)  of  Proposition  5.2  applies,  so 
E [M{tP,  P,)]wrp  =  jS  ln(P]).  Now  if  Xis  is  bounded,  then  the  moment  condition  of  part  (4)  of  Pro¬ 
position  3.7  is  satisfied  for  0  <  1.  Thus,  in  order  for  £2(P,  P\,  ip)  to  obey  the  proper  central  limit 
theorem,  we  must  have  P/P,  =  O {e9tpltP),  i.e.,  P  <  APX 1+0/3 /{p  ln(P,))  for  some  constant  A  .  Now 
let  P  =  P, 1+0/1  l(p  ln(P,))  and  suppose  we  were  to  use  Pi(P.tp)  (i-e.,  we  insist  on  at  least  one  repli¬ 
cation  on  all  processors).  The  expected  completion  time  associated  with  JlfPjp)  then  grows  like 
(1  +  Oft)  ln(P,)  —  ln(/?  ln(P,)).  For  #«/?«  1,  the  expected  completion  time  using  J I,(P,tp)  is  then 
nearly  twice  that  when  using  £2(P,  P\>  tp)-  However,  for  Owl  and  p>  1,  the  ratio  of  the  expected 
completion  times  is  approximately  (1  +  P)IP~  1  for  large  p.  Note  that  in  this  case,  the  difference 
in  expected  completion  times  basically  grows  like  ln(P,). 

The  above  analysis  has  considered  non-degenerate  limits  for  A(tP,  P,),  T(lP<  P,)  and  M(tP,  P,). 
Clearly,  if  lP  -*  oo  very  quickly,  then  A(tP,  P,)  =  0,  T(tP,  P,)  =  tP  P,  and  M(tP,  P,)  =  tP  with  high 
probability  (see,  e.g.,  part  (3)  of  Proposition  5.2).  We  conclude  this  section  by  analyzing  the  con¬ 
vergence  rates  of  these  r.v/s  for  large  tp  in  more  detail. 

Proposition  5.3 

Let  E[tjj+m+2]  <  oo  for  some  k  >  0  and  m>  0  and  let  P,  =  0{iPk)-  Then  as  tP -*  oo  : 

1.  lim  tPm+2  E[/l(fp,  P,)]  =  0 . 

P\  —oo 

2.  lim  tPm+]  E [T{tP,  P,)  -  (PP{\  =  0  . 

P;  —OO 

3.  lim  tPm  Var [T(/f,  P,)]  =  0  . 

Pj  -<oo 

4.  lim  tPml2  E P,)  -  tf]  =  0  . 

P,  ->oo 
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Proof:  For  part  (1), 


tPm+2  E[A(tP,  />,)]  =  tPm+2PxF(tP)  <  Ctpm+k+2F(tP)  -  0  (5.7) 

by  exercise  15  on  page  46  of  [4].  For  part  (2),  arguing  similarly  as  in  Proposition  5.1, 
tpm+'E[\T(tP,Px)  -  Pxtp\-\<tpm^PxE[\Utp)  -  tP\]<  CtPm+l+kE[  |  Ti(tP)  -  ^  13-0  (5.8) 


by  Corollary  2.8  of  [13].  For  part  (3),  Var[T(fP>  P,)]  =  Var[Tt(rP)3  and  the  result  then  follows 
along  similar  lines  by  showing  that  tpm+;:Var[Ti(/p)]  -*  0.  For  part  (4) 


tpml2ElM(tP,Px)  -  tp]  <  tPm'2  E[Tt(tP)  -  ip]  +  Var[r,(tP)]1/2 


by  the  global  bound  (Equation  4.2.6)  on  page  59  of  [8].  The  result  then  follows  similarly.  □ 

The  analog  of  Proposition  5.3,  under  moment  generating  type  assumptions  on  T,Jt  is  stated  below. 
Its  proof  is  essentially  identical  to  that  of  Proposition  5.3. 


Proposition  5.4 

Let  EfrjV*1']  <  do  for  some  6  >  0  and  let  P,  =  0(e°'tp)  where  0  <  0,  <  0.  Let  02  -  0  -  0X.  Then 
as  tp  -*  oo  : 

1.  Urn  tP2eei,pE[A(tp,Px)-]  =  0. 

Pl  —oo 

2.  lim  tP  e 02  E[T(tP,  P,)  -  tPP{]  =  0 . 

->oo 

3.  lim  e°2lp'Vai[T(tp,  P|)]  =  0. 

P]  ->oo 

4.  lim  e°2!pl2E[M(tp,Px)-tp]  =  0. 

P,  -*oo 
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6.  Summary 


This  paper  has  analyzed  properties  associated  with  a  simple  yet  effective  way  to  exploit  parallel 
processors  in  discrete  event  simulations:  averaging  the  results  of  multiple,  independent  replications 
that  are  run,  in  parallel,  on  multiple  processors.  We  assumed  that  there  was  a  CPU  time  constraint, 
t,  on  each  of  P  processors.  However,  we  showed  that,  unless  the  replication  lengths  are  bounded, 
one  must  be  willing  to  simulate  beyond  any  fixed,  finite  time  t  on  at  least  some  processors  in  order 
to  always  get  the  right  answer.  The  statistical  properties  of  a  variety  of  estimators  were  then  ex¬ 
plored.  Limit  theorems  were  obtained  for  these  estimators  when  either  the  number  of  processors 
or  the  CPU  time  constraint  approaches  infinity.  In  addition,  central  limit  theorems  and  bias  ex¬ 
pansions  were  obtained  when  both  of  these  parameters  simultaneously  get  large.  In  this  case,  rel¬ 
ative  growth  rates  for  P  and  t  were  determined  in  order  for  the  estimators  to  have  properly  centered 
central  limit  theorems.  For  example,  if  one  insists  on  never  simulating  beyond  time  t  (and  using 
the  estimator  then  P  must  grow  rather  slowly  with  respect  to  t.  On  the  other  hand,  one 

can  pre-select  P1  (0  <P\<  P)  of  the  processors  and  simulate  beyond  time  t  on  a  pre-selected 
processor  if  and  only  if  no  replications  have  yet  been  completed  on  that  processor.  This  results  in 
the  unbiased  estimator  /I2(P,  P, ,  t).  While  one  can  pre-select  an  asymptotically  negligible  number 
of  processors  (i.e.,  P]/P  ->  0),  this  places  growth  restrictions  on  the  relative  values  of  t  and  P,/P  that 
may  be  difficult  to  identify  in  practice. 

A  sensible  practical  approach  is  to  pre-select  a  fixed  fraction  a  of  the  processors.  While  there  is 
some  variance  inflation  for  large  P  and  finite  values  of  t  (as  opposed  to  pre-selecting  all  the 
processors),  this  inflation  will  be  modest  provided  that  t  is  not  too  small  with  respect  to  the  dis¬ 
tribution  of  t,j  and  a  is  not  too  close  to  0.  As  t~*  oo,  there  is  no  (asymptotic)  variance  inflation. 
In  addition,  for  large  values  of  t,  provided  Px  is  not  too  large,  T(t,  P,),  the  simulation  experiment's 
completion  time,  is  equal  to  t  with  sufficiently  high  probability  so  that  Var[T(t,  P,)]  -» 0  quite 
rapidly. 


6.  Summary 
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Abstract 

This  paper  analyzes  properties  associated  with  a  simple  yet  effective  way  to  exploit  parallel 
processors  in  discrete  event  simulations:  averaging  the  result*,  of  multiple,  independent  replications 
that  are  run,  in  parallel,  on  multiple  processors.  We  focus  on  estimating  expectations  from  termi¬ 
nating  simulations,  or  steady  state  parameters  from  regenerative  simulations.  We  assume  that  there 
is  a  CPU  time  constraint,  t,  on  each  of  P  processors.  Unless  the  replication  lengths  are  bounded, 
one  must  be  willing  to  simulate  beyond  any  fixed,  finite  time  t  on  at  least  some  processors  in  order 
to  always  obtain  a  strongly  consistent  estimator  (as  the  number  of  processors  increases).  We 
therefore  consider  simulation  experiments  in  which  t  is  viewed  as  either  being  a  strict  constraint, 
or  a  guideline,  in  which  case  simulation  beyond  time  t  is  permitted.  The  statistical  properties,  in¬ 
cluding  strong  laws,  central  limit  theorems,  bias  expansions  and  completion  time  distributions,  of 
a  variety  of  estimators  obtainable  from  such  an  experiment  are  derived.  We  propose  an  unbiased 
estimator  for  a  simple  mean  value.  This  estimator  requires  pre-selecting  a  fraction  of  the  processors. 
Simulation  beyond  time  t  may  be  required  on  a  pre-selected  processor,  but  only  if  no  replications 
have  yet  been  completed  on  that  processor. 
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